
    A Self-Organizing Algorithm for Modeling Protein Loops

    Protein loops, the flexible short segments connecting two stable secondary structural units in proteins, play a critical role in protein structure and function. Constructing chemically sensible conformations of protein loops that seamlessly bridge the gap between the anchor points without introducing any steric collisions remains an open challenge. A variety of algorithms have been developed to tackle the loop closure problem, ranging from inverse kinematics to knowledge-based approaches that utilize pre-existing fragments extracted from known protein structures. However, many of these approaches focus on the generation of conformations that mainly satisfy the fixed end-point condition, leaving the steric constraints to be resolved in subsequent post-processing steps. In the present work, we describe a simple solution that simultaneously satisfies not only the end-point and steric conditions, but also chirality and planarity constraints. Starting from random initial atomic coordinates, each individual conformation is generated independently by using a simple alternating scheme of pairwise distance adjustments of randomly chosen atoms, followed by fast geometric matching of the conformationally rigid components of the constituent amino acids. The method is conceptually simple, numerically stable and computationally efficient. Very importantly, additional constraints, such as those derived from NMR experiments, hydrogen bonds or salt bridges, can be incorporated into the algorithm in a straightforward and inexpensive way, making the method ideal for solving more complex multi-loop problems. The remarkable performance and robustness of the algorithm are demonstrated on a set of protein loops of length 4, 8, and 12 that have been used in previous studies.
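The alternating pairwise distance-adjustment step described above can be illustrated with a minimal distance-geometry sketch. All names here are illustrative and the constraint set is a toy chain of bond lengths; the authors' actual algorithm additionally handles anchor points, sterics, chirality and planarity, which are omitted.

```python
import random
import numpy as np

def adjust_pair(coords, i, j, target):
    """Move atoms i and j symmetrically along their connecting vector
    so that their distance becomes the target value."""
    delta = coords[j] - coords[i]
    dist = np.linalg.norm(delta)
    if dist < 1e-9:                      # coincident atoms: nudge apart randomly
        delta = np.random.randn(3)
        dist = np.linalg.norm(delta)
    corr = 0.5 * (dist - target) / dist  # split the correction between both atoms
    coords[i] += corr * delta
    coords[j] -= corr * delta

def self_organize(coords, constraints, sweeps=2000):
    """Alternating scheme: repeatedly pick a random constrained pair of
    atoms and adjust its distance.  Each adjustment perturbs neighboring
    constraints, but the perturbations shrink as the errors shrink, so
    the randomized relaxation converges for a feasible constraint set."""
    pairs = list(constraints.items())
    for _ in range(sweeps):
        (i, j), target = random.choice(pairs)
        adjust_pair(coords, i, j, target)
    return coords
```

Starting from random coordinates and a chain of 1.5 Å bond-length constraints, repeated random pair adjustments drive all constraint violations toward zero.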

    Amino acid "little Big Bang": Representing amino acid substitution matrices as dot products of Euclidian vectors

    Background: Sequence comparisons make use of a one-letter representation for amino acids, the necessary quantitative information being supplied by the substitution matrices. This paper deals with the problem of finding a representation that provides a comprehensive description of amino acid intrinsic properties consistent with the substitution matrices.
    Results: We present a Euclidian vector representation of the amino acids, obtained by the singular value decomposition of the substitution matrices. The substitution matrix entries correspond to the dot products of amino acid vectors. We apply this vector encoding to the study of the relative importance of various amino acid physicochemical properties upon the substitution matrices. We also characterize and compare the PAM and BLOSUM series of substitution matrices.
    Conclusions: This vector encoding introduces a Euclidian metric in the amino acid space, consistent with the substitution matrices. Such a numerical description of the amino acids is useful when intrinsic properties of amino acids are needed, for instance when building sequence profiles or finding consensus sequences with machine learning algorithms such as support vector machines and neural networks.
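The core construction, recovering per-amino-acid vectors whose pairwise dot products reproduce the substitution matrix entries, can be sketched with the eigendecomposition of the symmetric matrix (equivalent to its SVD up to signs). The function name is illustrative, and the clipping of negative eigenvalues is an assumption: the reconstruction is exact only when the matrix is positive semi-definite.

```python
import numpy as np

def amino_acid_vectors(S):
    """Factor a symmetric substitution matrix S into per-amino-acid
    vectors (one row per amino acid) whose pairwise dot products
    reproduce the entries of S.  Negative eigenvalues are clipped, so
    the reconstruction is exact only for positive semi-definite S."""
    w, V = np.linalg.eigh(S)          # eigendecomposition of symmetric S
    w = np.clip(w, 0.0, None)         # drop any negative spectrum
    return V * np.sqrt(w)             # scale each eigenvector column
```

For a positive semi-definite matrix, `amino_acid_vectors(S) @ amino_acid_vectors(S).T` recovers `S` exactly, which is the "substitution matrix entries as dot products" property the abstract describes.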

    HiTSEE KNIME: a visualization tool for hit selection and analysis in high-throughput screening experiments for the KNIME platform

    We present HiTSEE (High-Throughput Screening Exploration Environment), a visualization tool for the analysis of large chemical screens used to examine biochemical processes. The tool supports the investigation of structure–activity relationships (SAR analysis) and, through a flexible interaction mechanism, the navigation of large chemical spaces. Our approach is based on projecting one or a few molecules of interest and expanding around their neighborhood, and it allows large chemical libraries to be explored without creating an all-encompassing overview of the whole library. We describe the requirements we collected during our collaboration with biologists and chemists, the design rationale behind the tool, and two case studies on different datasets. The described integration (HiTSEE KNIME) into the KNIME platform allows additional flexibility in adapting our approach to a wide range of different biochemical problems and enables other research groups to use HiTSEE.

    CheS-Mapper - Chemical Space Mapping and Visualization in 3D

    Analyzing chemical datasets is a challenging task for scientific researchers in the field of chemoinformatics. It is important, yet difficult, to understand the relationships between the structure of chemical compounds, their physico-chemical properties, and their biological or toxic effects. In this respect, visualization tools can help researchers better comprehend the underlying correlations. Our recently developed 3D molecular viewer CheS-Mapper (Chemical Space Mapper) divides large datasets into clusters of similar compounds and then arranges them in 3D space, such that their spatial proximity reflects their similarity. The user can indirectly determine similarity by selecting which features to employ in the process. The tool can use and calculate different kinds of features, such as structural fragments as well as quantitative chemical descriptors. These features can be highlighted within CheS-Mapper, which helps the chemist better understand patterns and regularities and relate the observations to established scientific knowledge. Finally, the tool can also be used to select and export specific subsets of a given dataset for further analysis.
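The "spatial proximity reflects similarity" layout can be sketched as a plain PCA projection of a compounds-by-features matrix to three dimensions. This is an illustrative stand-in under that assumption, not CheS-Mapper's actual embedding procedure.

```python
import numpy as np

def embed_3d(features):
    """Project a compounds-by-features matrix to 3D coordinates so that
    Euclidean proximity in the layout reflects feature similarity.
    Implemented as PCA via SVD of the centered matrix: the top three
    right singular vectors span the directions of greatest variance."""
    X = features - features.mean(axis=0)       # center each feature
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return X @ Vt[:3].T                        # coordinates in the top-3 subspace
```

Compounds with similar feature values land close together in the 3D layout, while dissimilar clusters stay well separated.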

    Spike pattern recognition by supervised classification in low dimensional embedding space

    © The Author(s) 2016. This article is published with open access under the terms of the Creative Commons Attribution License 4.0 (http://creativecommons.org/licenses/by/4.0/).
    Epileptiform discharges in interictal electroencephalography (EEG) form the mainstay of epilepsy diagnosis and localization of seizure onset. Visual analysis is rater-dependent and time-consuming, especially for long-term recordings, while computerized methods can review long EEG recordings efficiently. This paper presents a machine learning approach for automated detection of epileptiform discharges (spikes). The proposed method first detects spike patterns by calculating similarity to a coarse shape model of a spike waveform and then refines the results by identifying subtle differences between actual spikes and false detections. Pattern classification is performed using support vector machines in a low-dimensional space onto which the original waveforms are embedded by locality preserving projections. The automatic detection results are compared to experts' manual annotations (101 spikes) on a whole-night sleep EEG recording. The high sensitivity (97%) and the low false positive rate (0.1 min⁻¹), calculated by intra-patient cross-validation, highlight the potential of the method for automated interictal EEG assessment.
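The first stage, scoring similarity to a coarse spike shape model, can be sketched as a sliding normalized cross-correlation against a template. Names and the threshold value are illustrative assumptions, and the second stage (LPP embedding plus SVM refinement) is omitted.

```python
import numpy as np

def detect_candidates(signal, template, threshold=0.8):
    """Slide a coarse spike template over the signal and flag window
    start indices whose normalized correlation (Pearson r) with the
    template exceeds the threshold."""
    n = len(template)
    t = (template - template.mean()) / template.std()
    hits = []
    for start in range(len(signal) - n + 1):
        w = signal[start:start + n]
        sd = w.std()
        if sd == 0:                    # flat window: no shape to match
            continue
        wn = (w - w.mean()) / sd
        if np.dot(wn, t) / n >= threshold:
            hits.append(start)
    return hits
```

A spike-shaped transient embedded in an otherwise flat trace is flagged exactly at its onset, since only the aligned window correlates perfectly with the template.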

    DPRESS: Localizing estimates of predictive uncertainty

    Background: The need for a quantitative estimate of the uncertainty of prediction for QSAR models is steadily increasing, in part because such predictions are being widely distributed as tabulated values disconnected from the models used to generate them. Classical statistical theory assumes that the error in the population being modeled is independent and identically distributed (IID), but this is often not actually the case. Such inhomogeneous error (heteroskedasticity) can be addressed by providing an individualized estimate of predictive uncertainty for each particular new object u: the standard error of prediction s_u can be estimated as the non-cross-validated error s_t* for the closest object t* in the training set, adjusted for its separation d from u in the descriptor space relative to the size of the training set.
    [display formula rendered as a graphic in the source; not recoverable here]
    The predictive uncertainty factor γ_t* is obtained by distributing the internal predictive error sum of squares across objects in the training set based on the distances between them, hence the acronym: Distributed PRedictive Error Sum of Squares (DPRESS). Note that s_t* and γ_t* are characteristic of each training set compound contributing to the model of interest.
    Results: The method was applied to partial least-squares models built using 2D (molecular hologram) or 3D (molecular field) descriptors applied to mid-sized training sets (N = 75) drawn from a large (N = 304), well-characterized pool of cyclooxygenase inhibitors. The observed variation in predictive error for the external 229-compound test sets was compared with the uncertainty estimates from DPRESS. Good qualitative and quantitative agreement was seen between the distributions of predictive error observed and those predicted using DPRESS. Inclusion of the distance-dependent term was essential to obtaining good agreement between the estimated uncertainties and the observed distributions of predictive error. The uncertainty estimates derived by DPRESS were conservative even when the training set was biased, but not excessively so.
    Conclusion: DPRESS is a straightforward and powerful way to reliably estimate individual predictive uncertainties for compounds outside the training set, based on their distance to the training set and the internal predictive uncertainty associated with their nearest neighbor in that set. It represents a sample-based, a posteriori approach to defining applicability domains in terms of localized uncertainty.
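In the spirit of the abstract, a nearest-neighbour uncertainty estimate might be sketched as below. The specific combination rule (sum of squares of the neighbour's error and a distance-scaled term) is an assumption chosen for illustration; it is NOT the published DPRESS formula, which was rendered as a graphic in the source.

```python
import numpy as np

def nn_uncertainty(x_new, X_train, s_train, gamma):
    """Illustrative nearest-neighbour predictive uncertainty: take the
    training-set error s_t* of the closest compound t* and inflate it
    by a distance-dependent term weighted by gamma_t*.  The combination
    rule used here is an assumption, not the DPRESS formula."""
    d2 = ((X_train - x_new) ** 2).sum(axis=1)   # squared distances to training set
    t = int(np.argmin(d2))                      # index of nearest neighbour t*
    return np.sqrt(s_train[t] ** 2 + gamma[t] * d2[t])
```

A query sitting exactly on a training compound inherits that compound's error, and the estimate grows monotonically as the query moves away, mirroring the abstract's point that the distance-dependent term drives the localized uncertainty.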

    Structure-based classification and ontology in chemistry

    Background: Recent years have seen an explosion in the availability of data in the chemistry domain. With this information explosion, however, retrieving relevant results from the available information, and organising those results, become even harder problems. Computational processing is essential to filter and organise the available resources so as to better facilitate the work of scientists. Ontologies encode expert domain knowledge in a hierarchically organised, machine-processable format. One such ontology for the chemical domain is ChEBI. ChEBI provides a classification of chemicals based on their structural features and a role- or activity-based classification. An example of a structure-based class is 'pentacyclic compound' (compounds containing five ring structures), while an example of a role-based class is 'analgesic', since many different chemicals can act as analgesics without sharing structural features. Structure-based classification in chemistry exploits elegant regularities and symmetries in the underlying chemical domain. As yet, there has been neither a systematic analysis of the types of structural classification in use in chemistry nor a comparison to the capabilities of available technologies.
    Results: We analyze the different categories of structural classes in chemistry, presenting a list of patterns for features found in class definitions. We compare these patterns of class definition to tools which allow for automation of hierarchy construction within cheminformatics and within logic-based ontology technology, going into detail in the latter case with respect to the expressive capabilities of the Web Ontology Language and recent extensions for modelling structured objects. Finally, we discuss the relationships and interactions between cheminformatics approaches and logic-based approaches.
    Conclusion: Systems that perform intelligent reasoning tasks on chemistry data require a diverse set of underlying computational utilities, including algorithmic, statistical and logic-based tools. For the task of automatic structure-based classification of chemical entities, essential to managing the vast swathes of chemical data being brought online, systems capable of hybrid reasoning that combines several different approaches are crucial. We provide a thorough review of the available tools and methodologies, and identify areas of open research.
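The structure-based subsumption idea, where a class defined by more required structural features is subsumed by any class requiring a subset of them, can be sketched as a simple set-inclusion check. The class names and feature labels below are illustrative toys, not actual ChEBI definitions.

```python
def is_subclass(child_features, parent_features):
    """A class whose definition requires every feature the parent
    requires (and possibly more) is a subclass of the parent:
    subsumption reduces to set inclusion over required features."""
    return parent_features <= child_features

# Toy class definitions: each chemical class maps to the set of
# structural features a member must exhibit (illustrative only).
classes = {
    "cyclic compound":      {"ring"},
    "pentacyclic compound": {"ring", "five rings"},
}
```

Under these toy definitions, 'pentacyclic compound' is automatically placed under 'cyclic compound' in the hierarchy, while the reverse inclusion fails, which is the kind of automated hierarchy construction the abstract discusses.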

    Natural products in modern life science

    With a real threat to biodiversity in rain forests and in the sea, sustainable use of natural products is becoming more and more important. Basic research directed at different organisms in Nature can reveal unexpected insights into fundamental biological mechanisms, but also new pharmaceutical or biotechnological possibilities of more immediate use. Many different strategies have been used to prospect the biodiversity of Earth in the search for novel structure–activity relationships, which has resulted in important discoveries in drug development. However, we believe that the development of multidisciplinary incentives will be necessary for a future successful exploration of Nature. To this end, one way forward would be a modernization and renewal of a venerable, proven interdisciplinary science, Pharmacognosy, which represents an integrated way of studying biological systems. This is demonstrated here with an explanatory model whose different parts are illustrated by our ongoing research: anti-inflammatory natural products have been discovered based on ethnopharmacological observations; marine sponges in cold water have yielded substances with ecological impact; a combined strategy of ecology and chemistry has revealed new insights into the biodiversity of fungi; in-depth studies of cyclic peptides (cyclotides) have created new possibilities for the engineering of bioactive peptides; new strategies using phylogeny and chemography have opened possibilities for navigating chemical and biological space; and bioinformatic tools for understanding lateral gene transfer could point to potential drug targets. A multidisciplinary subject like Pharmacognosy, one of several scientific disciplines bridging biology and chemistry with medicine, is strategically positioned for studies of complex scientific questions based on observations in Nature. Furthermore, natural product research based on intriguing scientific questions in Nature can help attract young students to modern life science.

    The ABCD of data management
